Scalable Classification over SQL Databases

نویسندگان

  • Surajit Chaudhuri
  • Usama M. Fayyad
  • Jeff Bernhardt
چکیده

We identify data-intensive operations that are common to classifiers and develop a middleware that decomposes and schedules these operations efficiently using a backend SQL database. Our approach has the added advantage of not requiring any specialized physical data organization. We demonstrate the scalability characteristics of our enhanced client with experiments on Microsoft SQL Server 7.0 by varying data size, number of attributes and characteristics of decision trees.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scalable Mining for Classification Rules in Relational Databases

Data mining is a process of discovering useful patterns (knowledge) hidden in extremely large datasets. Classification is a fundamental data mining function, and some other functions can be reduced to it. In this paper we propose a novel classification algorithm (classifier) called MIND (MINing in Databases). MIND can be phrased in such a way that its implementation is very easy using the exten...

متن کامل

Scalable Numerical SPARQL Queries over Relational Databases

We present an approach for scalable processing of SPARQL queries to RDF views of numerical data stored in relational databases (RDBs). Such queries include numerical expressions, inequalities, comparisons, etc. inside FILTERs. We call such FILTERs numerical expressions and the queries numerical SPARQL queries. For scalable execution of numerical SPARQL queries over RDBs, numerical operators sho...

متن کامل

ConQuer: A System for Efficient Querying Over Inconsistent Databases

Although integrity constraints have long been used to maintain data consistency, there are situations in which they may not be enforced or satisfied. In this demo, we showcase ConQuer, a system for efficient and scalable answering of SQL queries on databases that may violate a set of constraints. ConQuer permits users to postulate a set of key constraints together with their queries. The system...

متن کامل

Database Implementation of a Model-Free Classifier

Most methods proposed so far for classification of high-dimensional data are memory-based and obtain a model of the data classes through training before actually performing any classification. As a result, these methods are ineffective on (a) very large datasets stored in databases or datawarehouses, (b) datawhose partitioning into classes cannot be captured by global models and is sensitive to...

متن کامل

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management

One of the key advances in resolving the “big-data” problem has been the emergence of an alternative database technology. Today, classic RDBMS are complemented by a rich set of alternative Data Management Systems (DMS) specially designed to handle the volume, variety, velocity and variability of Big Data collections; these DMS include NoSQL, NewSQL and Search-based systems. NewSQL is a class of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999